116 research outputs found
Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation
Large language models (LLMs) have brought significant advancements to code
generation, benefiting both novice and experienced developers. However, their
training using unsanitized data from open-source repositories, like GitHub,
introduces the risk of inadvertently propagating security vulnerabilities. To
effectively mitigate this concern, this paper presents a comprehensive study
focused on evaluating and enhancing code LLMs from a software security
perspective. We introduce SecuCoGen\footnote{SecuCoGen has been uploaded as
supplemental material and will be made publicly available after publication.},
a meticulously curated dataset targeting 21 critical vulnerability types.
SecuCoGen comprises 180 samples and serves as the foundation for conducting
experiments on three crucial code-related tasks: code generation, code repair
and vulnerability classification, with a strong emphasis on security. Our
experimental results reveal that existing models often overlook security
concerns during code generation, leading to the generation of vulnerable code.
To address this, we propose effective approaches to mitigate the security
vulnerabilities and enhance the overall robustness of code generated by LLMs.
Moreover, our study identifies weaknesses in existing models' ability to repair
vulnerable code, even when provided with vulnerability information.
Additionally, certain vulnerability types pose challenges for the models,
hindering their performance in vulnerability classification. Based on these
findings, we believe our study will have a positive impact on the software
engineering community, inspiring the development of improved methods for
training and utilizing LLMs, thereby leading to safer and more trustworthy
model deployment
The Effect of Long-Term or Repeated Use of Antibiotics in Children and Adolescents on Cognitive Impairment in Middle-Aged and Older Person(s) Adults: A Cohort Study
Objectives: We evaluated the effects of long-term/recurrent use of antibiotics in childhood on developing cognitive impairment in middle and old age from UK Biobank Database.
Methods: UK Biobank recruited participants aged 37–73 years. Cognitive impairment was ascertained by fluid intelligence questionnaire. Primary outcome was the occurrence of cognitive impairment in middle and old age. Multivariate logistic regression models were used to explore the relationship between long-term/recurrent use of antibiotics and cognitive impairment.
Results: Over 3.8–10.8 years’ follow-up, 4,781 of the 35,921 participants developed cognitive impairment. The odds of cognitive impairment in middle and old age among long-term/recurrent use of antibiotics in childhood were increased by 18% compared with their counterparts (adjusted odd ratio 1.18, 95% confidence interval 1.08–1.29, p < 0.01). The effect of long-term/recurrent use of antibiotics in childhood on cognitive impairment was homogeneous across different categories of various subgroup variables such as sex, age, APOE4, ethnic groups, income before tax, smoking status, alcohol status, BMI, hypertension and diabetes but the effect of long-term/recurrent use of antibiotics in childhood was modified by the educational qualification (p-value for interaction <0.05).
Conclusion: Long-term/recurrent use of antibiotics in childhood may increase the risk of cognitive impairment in middle and old age
Multi-Granularity Detector for Vulnerability Fixes
With the increasing reliance on Open Source Software, users are exposed to
third-party library vulnerabilities. Software Composition Analysis (SCA) tools
have been created to alert users of such vulnerabilities. SCA requires the
identification of vulnerability-fixing commits. Prior works have proposed
methods that can automatically identify such vulnerability-fixing commits.
However, identifying such commits is highly challenging, as only a very small
minority of commits are vulnerability fixing. Moreover, code changes can be
noisy and difficult to analyze. We observe that noise can occur at different
levels of detail, making it challenging to detect vulnerability fixes
accurately.
To address these challenges and boost the effectiveness of prior works, we
propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unique from
prior works, Midas constructs different neural networks for each level of code
change granularity, corresponding to commit-level, file-level, hunk-level, and
line-level, following their natural organization. It then utilizes an ensemble
model that combines all base models to generate the final prediction. This
design allows MiDas to better handle the noisy and highly imbalanced nature of
vulnerability-fixing commit data. Additionally, to reduce the human effort
required to inspect code changes, we have designed an effort-aware adjustment
for Midas's outputs based on commit length. The evaluation results demonstrate
that MiDas outperforms the current state-of-the-art baseline in terms of AUC by
4.9% and 13.7% on Java and Python-based datasets, respectively. Furthermore, in
terms of two effort-aware metrics, EffortCost@L and Popt@L, MiDas also
outperforms the state-of-the-art baseline, achieving improvements of up to
28.2% and 15.9% on Java, and 60% and 51.4% on Python, respectively
Discriminative analysis of schizophrenia patients using graph convolutional networks: A combined multimodal MRI and connectomics analysis
IntroductionRecent studies in human brain connectomics with multimodal magnetic resonance imaging (MRI) data have widely reported abnormalities in brain structure, function and connectivity associated with schizophrenia (SZ). However, most previous discriminative studies of SZ patients were based on MRI features of brain regions, ignoring the complex relationships within brain networks.MethodsWe applied a graph convolutional network (GCN) to discriminating SZ patients using the features of brain region and connectivity derived from a combined multimodal MRI and connectomics analysis. Structural magnetic resonance imaging (sMRI) and resting-state functional magnetic resonance imaging (rs-fMRI) data were acquired from 140 SZ patients and 205 normal controls. Eighteen types of brain graphs were constructed for each subject using 3 types of node features, 3 types of edge features, and 2 brain atlases. We investigated the performance of 18 brain graphs and used the TopK pooling layers to highlight salient brain regions (nodes in the graph).ResultsThe GCN model, which used functional connectivity as edge features and multimodal features (sMRI + fMRI) of brain regions as node features, obtained the highest average accuracy of 95.8%, and outperformed other existing classification studies in SZ patients. In the explainability analysis, we reported that the top 10 salient brain regions, predominantly distributed in the prefrontal and occipital cortices, were mainly involved in the systems of emotion and visual processing.DiscussionOur findings demonstrated that GCN with a combined multimodal MRI and connectomics analysis can effectively improve the classification of SZ at an individual level, indicating a promising direction for the diagnosis of SZ patients. The code is available at https://github.com/CXY-scut/GCN-SZ.git
- …